IMAGE PROCESSING APPROACH FOR EXTRACTING TABLES FROM SCANNED DOCUMENTS

Aditya Kekare; Atharva Gosavi; Abhishek Jachak; Amit Deshmane

doi:10.17605/OSF.IO/RN8WK

Authors

Aditya Kekare, B.E. Student, Dept. of Computer Engineering, NBN Sinhgad School of Engineering, Ambegaon, Pune 411041, Maharashtra, India
Atharva Gosavi, B.E. Student, Dept. of Computer Engineering, NBN Sinhgad School of Engineering, Ambegaon, Pune 411041, Maharashtra, India
Abhishek Jachak, B.E. Student, Dept. of Computer Engineering, NBN Sinhgad School of Engineering, Ambegaon, Pune 411041, Maharashtra, India
Amit Deshmane, Software Architect, bizAmica Software Pvt. Ltd., Pune, Maharashtra, India

Keywords:

Image Processing, Optical Character Recognition

Abstract

Due to the data revolution of the 21st century, processing the ever-increasing volume of documents has become essential. Most data in the banking, financial, and administrative domains is still stored on physical documents, so there is a strong need to automate their processing. Much of the useful data in these documents is stored in tables. To preserve its value, tabular data must be extracted with its row-and-column structure intact. We use an image processing approach to extract these tables and the data they contain. We perform operations on scanned documents to identify the rows and columns of each table, then extract the text from each cell using Optical Character Recognition. We applied this approach to bordered tables and achieved more than 90% accuracy in extracting the tabular data.
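The row-and-column step described above can be illustrated with a minimal sketch. The abstract does not name the specific image operations used, so this example substitutes a simple projection-profile method and should not be read as the authors' implementation. It assumes the scan has already been binarized and cropped to a single bordered table whose ruling lines span the full crop. The function names, the `min_fill` threshold, the `margin` parameter, and the `ocr` callback are all hypothetical; in practice `ocr` would wrap an OCR engine of your choice.

```python
import numpy as np


def find_line_positions(binary, axis, min_fill=0.8):
    """Return the centre index of each ruling line along one direction.

    binary: 2-D bool array, True where a pixel is ink (assumed input format).
    axis=1 sums across each row and finds horizontal lines;
    axis=0 sums down each column and finds vertical lines.
    min_fill is a hypothetical threshold: the fraction of a row or column
    that must be ink for it to count as part of a ruling line.
    """
    fill = binary.sum(axis=axis) / binary.shape[axis]
    is_line = fill >= min_fill
    positions = []
    start = None
    for i, flag in enumerate(is_line):
        if flag and start is None:
            start = i  # a run of line pixels begins
        elif not flag and start is not None:
            positions.append((start + i - 1) // 2)  # centre of the run
            start = None
    if start is not None:  # run reaches the image edge
        positions.append((start + len(is_line) - 1) // 2)
    return positions


def cell_boxes(binary, min_fill=0.8):
    """Return a grid of (top, bottom, left, right) boxes between ruling lines."""
    rows = find_line_positions(binary, axis=1, min_fill=min_fill)
    cols = find_line_positions(binary, axis=0, min_fill=min_fill)
    return [
        [(top, bottom, left, right) for left, right in zip(cols, cols[1:])]
        for top, bottom in zip(rows, rows[1:])
    ]


def extract_table(binary, ocr, margin=1, min_fill=0.8):
    """Apply a caller-supplied `ocr` callable to the interior of each cell.

    `ocr` is a placeholder for whatever OCR engine is used; it receives the
    cropped cell as a 2-D array and returns the recognized text.
    `margin` trims that many pixels inside each border line centre.
    """
    table = []
    for row in cell_boxes(binary, min_fill=min_fill):
        table.append([
            ocr(binary[top + margin:bottom, left + margin:right])
            for top, bottom, left, right in row
        ])
    return table
```

Because the result is a list of rows, each holding one entry per column, the tabular structure survives extraction. Thick or skewed ruling lines, or tables that do not fill the crop, would need a larger `margin`, deskewing, or a per-table crop before this sketch applies.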
